Intuit | Data Engineer 2 Interview Experience - Crafting Scalable Data Solutions



Round 1 – Data Structures, Algorithms, and Big Data

This round focused on core DSA fundamentals along with conceptual understanding of Big Data and Spark internals.

DSA Questions

One problem involved computing the LCM of two numbers in Python. The expected approach was to use the GCD-based formula rather than brute force, highlighting awareness of time complexity concerns for large inputs.
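A GCD-based solution along the lines the interviewer expected might look like this (a minimal sketch; note that Python 3.9+ also ships `math.lcm` directly):

```python
from math import gcd

def lcm(a: int, b: int) -> int:
    """LCM via the identity lcm(a, b) * gcd(a, b) == |a * b|.

    Dividing before multiplying (a // gcd(a, b) * b) keeps the
    intermediate value small, which matters for large inputs.
    """
    if a == 0 or b == 0:
        return 0
    return abs(a // gcd(a, b) * b)
```

The brute-force alternative (counting up from `max(a, b)` until a common multiple is found) is O(a*b) in the worst case, which is the time-complexity concern the interviewer was probing.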

Another question focused on the Longest Palindromic Subsequence. The correct solution required dynamic programming using either a top-down memoized approach or a bottom-up two-dimensional DP table. The interviewer specifically tested whether candidates could distinguish between palindromic subsequence and palindromic substring problems.
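A bottom-up sketch of that DP follows. The distinction being tested: a subsequence need not be contiguous (the answer for "bbbab" is 4, via "bbbb"), whereas a palindromic substring must be.

```python
def longest_palindromic_subsequence(s: str) -> int:
    """Bottom-up 2D DP: dp[i][j] is the LPS length of s[i..j]."""
    n = len(s)
    if n == 0:
        return 0
    dp = [[0] * n for _ in range(n)]
    for i in range(n - 1, -1, -1):
        dp[i][i] = 1  # a single character is a palindrome of length 1
        for j in range(i + 1, n):
            if s[i] == s[j]:
                # matching ends extend the inner LPS by 2
                dp[i][j] = dp[i + 1][j - 1] + 2
            else:
                # otherwise drop one end and take the better result
                dp[i][j] = max(dp[i + 1][j], dp[i][j - 1])
    return dp[0][n - 1]
```

This runs in O(n^2) time and space; a top-down memoized version has the same recurrence and complexity.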

Big Data Concepts

A significant portion of this round focused on Spark 2 to Spark 3 migration topics. Discussion areas included Adaptive Query Execution, shuffle optimizations, dynamic partition pruning, and Catalyst optimizer improvements. The interviewer expected candidates to explain not just what changed in Spark 3, but why those changes improve performance, ideally using practical examples involving joins or aggregations.
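In practice, the AQE and dynamic partition pruning discussion tends to land on a handful of Spark 3 configuration flags. A representative fragment (flag names as of Spark 3.x — verify defaults against the docs for your version, since AQE became enabled by default in Spark 3.2):

```
# Illustrative spark-defaults.conf fragment
spark.sql.adaptive.enabled                              true
spark.sql.adaptive.coalescePartitions.enabled           true
spark.sql.adaptive.skewJoin.enabled                     true
spark.sql.optimizer.dynamicPartitionPruning.enabled     true
```

Being able to tie each flag to a concrete win — skew-join splitting for a lopsided join key, partition coalescing after an over-partitioned aggregation — is what distinguishes "what changed" from "why it is faster."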

Round 2 – Craft Demo (Hands-on Proof of Concept)

This round was a take-home assignment with two to three days to design, implement, and present a real-world data pipeline combining batch and streaming workloads.

Expected Deliverables

Candidates were expected to present a complete architecture diagram including components such as Kafka, Spark, Delta Lake, Snowflake, orchestration tools, and monitoring layers. Clear data models for raw, staging, transformed, and analytics layers were required, along with a clean code repository containing modular code, documentation, and unit tests.

Strong emphasis was placed on design justifications. Candidates were expected to explain trade-offs such as Parquet versus Avro, batch versus streaming, and storage format choices. Handling schema evolution, late-arriving data, and exactly-once guarantees in streaming pipelines was a key evaluation criterion.

Round 3 – Assessor Round

This round focused on implementation depth and low-level Spark understanding.

One major discussion involved implementing Slowly Changing Dimension Type 2 in PySpark. The expected approach included using window functions to identify the latest records, joining incoming data with existing datasets to detect changes, and using Delta Lake MERGE operations for insert and update logic. Candidates were expected to clearly explain how new records, updated records, and unchanged records were handled.
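The classification logic behind such a MERGE can be sketched framework-agnostically in plain Python (hypothetical shapes — in PySpark this maps onto a join of incoming rows against current rows followed by a Delta `MERGE INTO`):

```python
from datetime import date

def classify_scd2(existing: dict, incoming: dict, today: date):
    """Split an incoming batch into SCD Type 2 actions.

    existing: {business_key: attrs} for rows where is_current is true
    incoming: {business_key: attrs} from the new batch
    Returns (inserts, updates, unchanged). Keys in `updates` need their
    current row closed out (end-dated) before the new version is inserted.
    """
    inserts, updates, unchanged = [], [], []
    for key, attrs in incoming.items():
        if key not in existing:
            inserts.append((key, attrs, today))   # brand-new key
        elif existing[key] != attrs:
            updates.append((key, attrs, today))   # changed: expire old, insert new version
        else:
            unchanged.append(key)                 # no-op, leave current row alone
    return inserts, updates, unchanged
```

The scalability point interviewers look for: the change-detection join should be on the business key (ideally with partition pruning or a hash of the compared columns), never a full scan of the dimension's history.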

An additional Spark coding problem was based on RDD transformations. The interviewer evaluated understanding of mapPartitions, reduceByKey, and groupByKey, along with the ability to reason about shuffle behavior, caching strategies, and parallelism tuning using repartition and coalesce.

Round 4 – Team Member Round

This round involved deep dives into Big Data ecosystem components and real-world usage scenarios.

Topics included Delta Lake internals such as time travel, compaction strategies using OPTIMIZE and VACUUM, and ACID transaction handling via log files. HDFS concepts like block size tuning, replication factor, and data locality were discussed in detail. Kafka discussions covered offset management strategies, delivery semantics, and partitioning techniques to maintain message ordering.
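The Kafka ordering point in particular is worth being able to demonstrate: Kafka guarantees ordering only within a partition, so routing messages by a consistent key keeps all events for one entity in order. A simplified model of key-based routing (Kafka's default partitioner actually uses murmur2, but any stable hash shows the property):

```python
import hashlib

def partition_for(key: str, num_partitions: int) -> int:
    """Deterministic key -> partition mapping (simplified stand-in
    for Kafka's hash-based default partitioner)."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % num_partitions

# Every event for the same key lands on the same partition,
# so per-key ordering survives even with many partitions.
event_keys = ["user-1", "user-2", "user-1", "user-3", "user-1"]
assignments = [partition_for(k, 6) for k in event_keys]
```

The corollary interviewers probe: increasing the partition count changes the key-to-partition mapping, which is why repartitioning a topic with ordering requirements is disruptive.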

Candidates were expected to clearly articulate fault tolerance strategies, replay mechanisms, and data lineage in streaming architectures, often through whiteboarding or verbal walkthroughs.

Round 5 – Hiring Manager Round

The final round focused on project deep dives and behavioral evaluation.

Candidates were asked to walk through one or two end-to-end projects, explaining architectural decisions, scalability challenges, query tuning strategies, and trade-offs between real-time and batch processing. Behavioral questions focused on decision-making under pressure, handling ambiguity, and collaboration with analysts and data scientists.

Clear, data-driven impact metrics were strongly encouraged, such as reductions in query latency, improvements in pipeline SLAs, or cost optimizations achieved through design changes.

Key Takeaways from the Intuit Interview

- Strong PySpark skills are essential, especially with DataFrame APIs, joins, and Delta Lake operations.
- Understanding Spark 3 internals such as Adaptive Query Execution and skew handling provides a significant advantage.
- The Craft Demo round tests real-world readiness, including design thinking, documentation quality, and trade-off analysis.
- SCD Type 2 implementations must be scalable and avoid full table scans.
- Streaming pipelines require clear justification of technology choices rather than defaulting to complex stacks.
- Deep knowledge of past projects, including architectural decisions and lessons learned, is critical.

Final Thoughts

The Intuit interview process evaluates far more than the ability to write correct code. It emphasizes scalable system design, clarity of thought, and the ability to explain technical decisions confidently. Candidates with strong fundamentals in DSA, Big Data, and PySpark, combined with hands-on architectural experience, are well positioned to succeed in this process.